%
% Section for Lessons Learned
%
\section{Lessons Learned}
\label{sec:LessonsLearned}


% Subsection title
\subsection{Event Importance}

To improve the precision of the results returned by ChronoSearch, we attempted to remove events that were deemed unimportant. The guiding principle was that the Web caters to user interest by creating content that users are interested in. Therefore, the more popular an event description is on the Web, the more important the event is from the perspective of the end user, and the more likely it is to be among the results a ChronoSearch user expects. The idea was to leverage the ranking algorithms of existing search engines and use their results to judge the importance of each event description returned by our system.

Concretely, we issued each event description as a search engine query and used the reported result count as a proxy for popularity. We hoped that these counts would align with the relative importance of the events, so that events whose counts fell below some reasonable threshold could be removed from the results returned by ChronoSearch.
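
The pruning rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ChronoSearch implementation: the function name \texttt{prune\_unpopular\_events}, the \texttt{min\_count} threshold, and the placeholder counts are all hypothetical, and the lookup function stands in for a real search engine query.

```python
from typing import Callable, Iterable, List


def prune_unpopular_events(
    events: Iterable[str],
    result_count: Callable[[str], int],
    min_count: int,
) -> List[str]:
    """Keep only the events whose search result count reaches min_count."""
    return [event for event in events if result_count(event) >= min_count]


# Placeholder counts (made up for illustration); a real system would obtain
# them by querying a search engine with each event description.
placeholder_counts = {
    "Example event description A": 2_500_000,
    "Example event description B": 40,
}


def lookup_count(event: str) -> int:
    """Hypothetical stand-in for a search engine result count lookup."""
    return placeholder_counts[event]


kept = prune_unpopular_events(placeholder_counts, lookup_count, min_count=1_000)
print(kept)
```

The threshold is left as a parameter because the text only speaks of ``some reasonable level''; no particular value is implied.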

We implemented this mechanism using the Bing search engine and tried three ways of submitting an event description as a query. The first searched the event description verbatim, exactly as found by the system. The second removed any operators recognized by the Bing search engine, mainly Boolean operators, and then searched the remaining text. The third removed any quotation marks from the description, surrounded the whole sentence with quotation marks, and searched for that quoted sentence. Each of these methods had problems that prevented ChronoSearch from using this mechanism to prune unimportant events.
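
The three query-preparation techniques can be sketched roughly as below. The function names and the exact operator set are assumptions made for illustration; the text does not list which operators were removed, so the sketch strips only the words AND, OR, and NOT and the symbols \texttt{|} and \texttt{\&}.

```python
import re

# Assumed operator set: the operators actually removed by the system are not
# listed in the text, so this covers only common Boolean words and symbols.
OPERATOR_WORDS = re.compile(r"\b(?:AND|OR|NOT)\b")
OPERATOR_SYMBOLS = re.compile(r"[|&]")
QUOTE_CHARS = re.compile(r'["\u201c\u201d]')  # straight and curly double quotes


def verbatim_query(sentence: str) -> str:
    """Technique 1: submit the event description unchanged."""
    return sentence


def without_operators(sentence: str) -> str:
    """Technique 2: strip Boolean operators, then normalize whitespace."""
    stripped = OPERATOR_WORDS.sub(" ", sentence)
    stripped = OPERATOR_SYMBOLS.sub(" ", stripped)
    return " ".join(stripped.split())


def exact_phrase_query(sentence: str) -> str:
    """Technique 3: drop inner quotation marks, then quote the whole sentence."""
    return '"' + QUOTE_CHARS.sub("", sentence) + '"'


sentence = "Quick Read | Comments | 11.01.2011 Women In Tech"
print(verbatim_query(sentence))
print(without_operators(sentence))
print(exact_phrase_query(sentence))
```

Matching the operator words case-sensitively is a deliberate choice in this sketch, so that ordinary lowercase words such as ``or'' and ``not'' inside a sentence are left intact.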

Searching for the event sentences verbatim raises several problems. One is that some result sentences contain Boolean operators, and a search containing a logical ``OR'' greatly skews the results. Consider the following sentence: \begin{quote}\textit{Quick Read \textbar Comments \textbar 11.01.2011 Women In Tech: Why Is There No Female Steve Jobs?}\end{quote}
A search on this sentence returns an extremely large number of results because of the vertical bars, which act as logical ``OR'' operators. The Bing search engine reported 1,480,000,000 results for this sentence. This count says nothing reliable about the sentence's mention of Jobs, and it misleads ChronoSearch into treating the sentence as very important, which it clearly is not. This problem led to the second technique, which removes any Boolean operators used by the search engine before searching.

The second approach removed search engine operators, such as the logical AND, OR, and NOT, from the result sentences before searching on them. This did not correctly identify important results either. One problem was that some candidate sentences contained quotation marks, which lowered the result count because the quoted fragments were matched verbatim. In addition, very short sentences tended to have very high result counts simply because they mentioned Steve Jobs, even when they contained no other useful information. Consider the following sentence as an example: \begin{quote}\textit{\textasciicircum Jobs, Steve (January 5, 2009).}\end{quote}
This sentence yielded a search result count of 51,000,000, which was relatively high for this approach, so the event would be deemed very important. That is clearly wrong, because the sentence describes no event at all.

The last approach removed quotation marks from the candidate sentence and then searched for the whole sentence in quotes. This is not sufficient, because long sentences often have very low result counts regardless of the importance of the event. The approach also depends on how well the sentence extraction methods work and how clean the result sentences are: any extraneous characters or run-on sentences can throw off the search. Consider the following sentence as an example: \begin{quote}\textit{This is a prepared text of the Commencement address delivered by Steve Jobs, CEO of Apple Computer and of Pixar Animation Studios, on June 12, 2005.}\end{quote}
This sentence received a search result count of 0, which does not reflect the importance of the event. Exact-phrase search, that is, searching for a result sentence surrounded by quotes, is therefore not an effective mechanism for determining importance either.

Our initial assumption was that we could leverage existing search engines to rank importance for us. However, we were unable to find an effective method that supported this assumption. To check whether the assumption held for the methods we attempted, we tested for a correlation between event importance and search engine result counts. We manually assigned each of 80 result sentences an importance rank from 0, meaning not important at all, to 3, meaning very important. We then obtained the search result count for each of these 80 sentences by entering it into the Bing search engine, after removing Boolean operators from the sentence. The results of this test can be seen in Figure~\ref{fig:EventImportance}.
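
One way to quantify such a comparison numerically is a rank correlation coefficient between the manual importance ranks and the result counts. The sketch below computes Spearman's coefficient using only the standard library. It is an illustration rather than the analysis performed for this evaluation, and the sample values are made-up placeholders, not the 80 rated sentences.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's coefficient: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder data (made up): manual importance ranks and result counts.
importance = [0, 3, 1, 2, 0, 3]
result_counts = [5_000_000, 120, 40_000, 900, 75, 2_000_000]
print(spearman(importance, result_counts))
```

A coefficient near zero would be consistent with the absence of a monotonic relationship, while values near $1$ or $-1$ would indicate a strong one. The tie handling matters here because a 0 to 3 scale over 80 sentences necessarily produces many tied importance ranks.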

%
% Chart for the Event Importance
%
\begin{figure*}
\caption{Event Importance}
\label{fig:EventImportance}
\includegraphics[width=160mm]{EventImportance.png}
\end{figure*}

The figure shows no apparent correlation between event importance and search engine result counts for the methods implemented in ChronoSearch. Several rank 0 results had very large search result counts, which is counter-intuitive, and many rank 3 events returned very low search result counts. For these reasons, and given the general inconsistency of the graph, no correlation can be established between event importance and search engine result counts. Our initial assumption that this method could improve the precision of our results therefore did not hold.
